Clustering in Trees: Optimizing Cluster Sizes and Number of Subtrees
Authors
Abstract
This paper considers partitioning the vertices of an $n$-vertex tree into $p$ disjoint sets $C_1, C_2, \ldots, C_p$, called clusters, so that the number of vertices in a cluster and the number of subtrees in a cluster are minimized. For this NP-hard problem we present greedy heuristics which differ in (i) how subtrees are identified (using a best-fit, good-fit, or first-fit selection criterion), (ii) whether clusters are filled one at a time or simultaneously, and (iii) how much cluster sizes can differ from the ideal size of $c$ vertices per cluster, where $n = cp$. The last criterion is controlled by a constant $\alpha$, $0 \le \alpha < 1$, such that cluster $C_i$ satisfies $(1 - \alpha/2)c \le |C_i| \le c(1 + \alpha)$ for $1 \le i \le p$. For the algorithms resulting from combinations of these criteria we develop worst-case bounds on the number of subtrees in a cluster in terms of $c$, $\alpha$, and the maximum degree of a vertex. We present experimental results which give insight into how the parameters $c$, $\alpha$, and the maximum degree of a vertex affect the number of subtrees and the cluster sizes.

Communicated by G. Liotta: submitted November 1999, revised August 2000.

1. Hambrusch's research supported in part by the National Science Foundation under Grant 9988339-CCR.
2. Lim's research supported in part by Korea Science and Engineering Foundation under Contract No. 98-0102-07-01-3.

S. E. Hambrusch et al., Clustering in Trees, JGAA, 4(4) 1–26 (2000)
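The two quantities traded off above, cluster size and the number of subtrees per cluster, can be evaluated for any given partition. The sketch below is a minimal Python illustration under our own assumptions (vertices labeled 0..n−1, the tree given as an edge list, clusters given as vertex lists that partition all vertices). `cluster_stats` and `within_bounds` are hypothetical helper names, not from the paper, and the code does not implement the paper's greedy heuristics; it only scores a partition. It relies on the fact that the subgraph induced by any vertex subset of a tree is a forest, so its number of subtrees equals its vertex count minus its internal edge count.

```python
def cluster_stats(n, edges, clusters):
    """Return (sizes, subtree_counts) for a partition of a tree's vertices.

    Assumptions (ours, not the paper's): vertices are labeled 0..n-1, the tree
    is given as a list of (u, v) edges, and `clusters` is a list of vertex
    lists that together cover every vertex exactly once.
    """
    owner = [-1] * n  # owner[v] = index of the cluster containing vertex v
    for i, cluster in enumerate(clusters):
        for v in cluster:
            owner[v] = i

    # Count tree edges whose two endpoints lie in the same cluster.
    internal_edges = [0] * len(clusters)
    for u, v in edges:
        if owner[u] == owner[v]:
            internal_edges[owner[u]] += 1

    sizes = [len(cluster) for cluster in clusters]
    # A vertex subset of a tree induces a forest, so its number of
    # connected pieces (subtrees) is #vertices - #internal edges.
    subtrees = [s - e for s, e in zip(sizes, internal_edges)]
    return sizes, subtrees


def within_bounds(sizes, c, alpha):
    """Check (1 - alpha/2) * c <= |C_i| <= c * (1 + alpha) for every cluster."""
    lower = (1 - alpha / 2) * c
    upper = c * (1 + alpha)
    return all(lower <= s <= upper for s in sizes)


if __name__ == "__main__":
    # Path 0-1-2-3-4-5 split into p = 2 clusters of ideal size c = 3.
    path = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
    print(cluster_stats(6, path, [[0, 1, 2], [3, 4, 5]]))  # ([3, 3], [1, 1])
    print(cluster_stats(6, path, [[0, 2, 4], [1, 3, 5]]))  # ([3, 3], [3, 3])

    # Star with center 0: a cluster made only of leaves has one subtree per leaf.
    star = [(0, 1), (0, 2), (0, 3), (0, 4)]
    print(cluster_stats(5, star, [[1, 2, 3], [0, 4]]))  # ([3, 2], [3, 1])

    # c = 10, alpha = 0.5 allows cluster sizes in [7.5, 15].
    print(within_bounds([8, 15], 10, 0.5))  # True
    print(within_bounds([7, 13], 10, 0.5))  # False
```

In the path example both partitions meet the size bounds exactly, yet the interleaved one has three times as many subtrees per cluster, so the size constraint alone does not control the number of subtrees.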
Similar resources
Graph Clustering by Hierarchical Singular Value Decomposition with Selectable Range for Number of Clusters Members
Graphs have many applications in real-world problems. When we deal with huge volumes of data, analyzing the data is difficult or sometimes impossible. In big data problems, clustering is a useful tool for data analysis. Singular value decomposition (SVD) is one of the best algorithms for clustering graphs, but it gives us no choice in selecting the number of clusters and the number of members ...
The Subtree Size Profile of Bucket Recursive Trees
Kazemi (2014) introduced a new version of bucket recursive trees as another generalization of recursive trees in which buckets have variable capacities. In this paper, we obtain the $p$-th factorial moments of the random variable $S_{n,1}$, which counts the size-1 subtree profile (the number of leaves), and show a phase change of this random variable. These can be obtained by solving a first-order partial...
Hybrid Algorithm for Noise-free High Density Clusters with Self-Detection of Best Number of Clusters
Clustering is the process of discovering groups of objects such that objects in the same group are similar and objects belonging to different groups are dissimilar. A number of clustering algorithms exist that can solve the clustering problem, but most of them are very sensitive to their input parameters. The Minimum Spanning Tree clustering algorithm is capable of detecting clusters with irre...
Clustering XML Documents Using Closed Frequent Subtrees: A Structural Similarity Approach
This paper presents an experimental study conducted on the INEX 2007 Document Mining Challenge corpus, employing a frequent subtree-based incremental clustering approach. Using the structural information of the XML documents, the closed frequent subtrees are generated. A matrix is then developed representing the closed frequent subtree distribution in documents. This matrix is used to progres...
Further Analysis on the Total Number of Subtrees of Trees
When considering the total number of subtrees of trees, the extremal structures which maximize this number among binary trees and among trees with a given maximum degree lead to some interesting facts that correlate with some other graphical indices in applications. Along this line, it is interesting to study, over some types of trees with a given order, which trees minimize or maximize this number...
Journal title:
- J. Graph Algorithms Appl.
Volume 4, Issue
Pages -
Publication date 2000